Lecture #11 


Sorting Algorithms, part II:

- Quicksort 

- Mergesort 

Trees 

- Introduction 

- Implementation & Basic Properties 
- Traversals: The Pre-order Traversal

On-your-own Study

- Full binary trees 


But first... STL Challenge 


Give me a data structure that I can use to maintain a bunch of people's names and for 
each person, allows me to easily get all of the streets they lived on. 


Assuming I have P total people and each person has lived on an average of E former 
streets... 


What is the Big-Oh cost of: 


A. Finding the names of all people who have lived on "Levering street"?

B. Determining if "Bill" ever lived on "Westwood blvd"?

C. Printing out every name along with each person's street addresses, in alphabetical order (names and addresses in alpha-order).

D. Printing out all of the streets that "Tala" has lived on.
Divide & Conquer Sorting Algorithms
What's the big picture? 


Quicksort and Mergesort are efficient 
"divide and conquer" sorting algorithms. 


They generally work as follows: 


1. Divide the elements to be sorted into two groups of roughly equal size.

2. Sort each of these smaller groups of elements (conquer) using recursion.

3. Combine the two sorted groups into one large sorted group.


These sorts generally require O(N*log2(N)) steps. 


Divide and Conquer Sorting 


The last two sorts we'll learn (for now) are 
Quicksort and Mergesort. 


These sorts generally work as follows: 


1. Divide the elements to be sorted into two groups of roughly equal size.

2. Sort each of these smaller groups of elements (conquer).

3. Combine the two sorted groups into one large sorted list.


Any time you see “divide and conquer,” you should think recursion... EEK!




The Quicksort Algorithm 


1. If the array contains only 0 or 1 element, return.

2. Select an arbitrary element P from the array (typically the first element in the array).

3. Move all elements that are less than or equal to P to the left of the array and all elements greater than P to the right (this is called partitioning).

4. Recursively repeat this process on the left sub-array and then the right sub-array.


13 1 21 30 69 40 77

Select an arbitrary item P from the array.

Move items smaller than or equal to P to the left and larger items to the right; P goes in-between.
Recursively repeat this process on the left items.
Recursively repeat this process on the right items.


[Figure: piles of cash of different heights, one per person: EE Major, MBA, History Major, Bio Major, Drop-out, CS Major, USC Grad. Top row: before sorting. Second row: after the first partition.]

QuickSort 


The top row of piles is the initial configuration before any sorting has taken place.

The second row of piles is after we've selected EE Major as our “arbitrary pile” P and moved all shorter or equal-height piles to the left, and all taller piles to the right.

Notice that while the second row is not fully sorted, pile P (EE Major) is actually in the right place - it never needs to be moved again.

Why? Because every pile left of P is shorter than or equal to P, and every pile right of P is taller than it. So P is in the perfect position - the position it'll be in once everything is completely sorted. This means that we can independently sort the left three piles, then independently sort the right three piles, leaving P as-is.

And then everything will be sorted!


QuickSort 


[Figure: the three piles left of EE Major, partitioned around History Major.]

Everything left of EE Major (our first P) is now sorted!

• This slide shows us recursively sorting the left three piles.

• Again, we pick an arbitrary pile P (in this case, History Major) and then move everything less than or equal to it to the left, and everything taller to the right.

• Since there are only three items, this results in the left part of the array being sorted!

• But if there were more items, we'd repeat this process over and over.


QuickSort 


[Figure: the three piles right of EE Major, partitioned the same way, followed by the final sorted row of piles.]

Everything right of EE Major (our first P) is now sorted!

Finally, all items are sorted!


And here's an actual Quicksort C++ function:

void QuickSort(int Array[], int First, int Last)
{
    if (Last - First >= 1)    // Only bother sorting arrays of at least two elements!
    {
        int PivotIndex;

        // DIVIDE: Pick an element. Move <= items left. Move > items right.
        PivotIndex = Partition(Array, First, Last);

        // CONQUER: Apply our QS algorithm to the left half of the array.
        QuickSort(Array, First, PivotIndex - 1);   // left

        // CONQUER: Apply our QS algorithm to the right half of the array.
        QuickSort(Array, PivotIndex + 1, Last);    // right
    }
}

First specifies the starting element of the array to sort. Last specifies the last element of the array to sort.

[Figure: the example array 13 1 21 30 69 40 77 46, at indexes 0 through 7.]


The QS Partition Function

The Partition function uses the first item as the pivot value and moves less-than-or-equal items to the left and larger ones to the right.

#include <utility>   // for std::swap
using namespace std;

int Partition(int a[], int low, int high)
{
    int pi = low;
    int pivot = a[low];                          // Select the first item as our pivot value

    do
    {
        while (low <= high && a[low] <= pivot)   // Find next value on the left
            low++;                               //   that is > than the pivot.
        while (a[high] > pivot)                  // Find next value on the right
            high--;                              //   that is <= than the pivot.
        if (low < high)
            swap(a[low], a[high]);               // Swap the two out-of-place items
    }
    while (low < high);

    swap(a[pi], a[high]);                        // Swap our pivot into the right spot
    pi = high;
    return pi;                                   // Return the slot # of our pivot item in the array
}


Big-oh of Quicksort

We first partition the array, at a cost of n steps.

Then we repeat the process for each half... We partition each of the 2 halves, each taking n/2 steps, at a total cost of n steps.

Then we repeat the process for each half... We partition each of the 4 quarters, each taking n/4 steps, at a total cost of n steps.

So at each level, we do n operations, and (assuming each pivot splits its group roughly in half) we have log2(n) levels, so we get: n*log2(n).
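The level-by-level count above can be written as a single sum. This assumes, as in the example, that every pivot lands near the middle, so level k has about 2^k groups of about n/2^k items each:

```latex
\text{total steps} \approx \sum_{k=0}^{\log_2 n - 1}
  \underbrace{2^k}_{\text{groups at level } k} \cdot \underbrace{\frac{n}{2^k}}_{\text{steps per group}}
  = \sum_{k=0}^{\log_2 n - 1} n
  = n \log_2 n
```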


Quicksort - Is It Always Fast?


Are there any kinds of input data where Quicksort is 
either more or less efficient? 


Yes! If our array is already sorted or mostly sorted, 
then quicksort becomes very slow! 




10 20 30 40 50 60 70 


Let's see why. 


Worst-case Big-oh of Quicksort

We first partition the array, at a cost of n steps.

Then we repeat the process for the left and right groups... (n-1 steps)

Then we repeat the process for the left and right groups... (n-2 steps)

[Figure: 10 20 30 40 50 60 70; after each partition the pivot stays on the far left and the right group shrinks by one item.]

When an array is already sorted, the smallest item will always be on the left.

So if we choose the first item as the pivot P, after our partition algorithm, P will stay all the way on the left! So rather than having roughly half the array moved left of the pivot P and half on the right side, as we saw in our example with piles of cash, we'll have N-1 items on the right side of P and zero to its left!

So now when we do recursion on the left side there's nothing to do, since there are zero items less than the pivot P to sort...

And when we do recursion on the right side, we have N-1 items still to sort.

So to fully sort the array, we have to recurse down N-1 levels deep!!


Worst-case Big-oh of Quicksort

What you'll notice is that each time we partition, we remove only one item off the left side!

And if we only remove one item off the left side each time...

We're going to have to go through this partitioning process n times to process the entire array!

And if the partition algorithm requires ~n steps at each level...

And we go n levels deep...

Then our algorithm is O(n^2)!

[Figure: 10 20 30 40 50 60 70 shrinking by one item per level, costing n steps, then n-1 steps, then n-2 steps, then n-3 steps, and so on.]
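Adding up the per-level costs makes the O(n^2) bound concrete: the partitions cost n, then n-1, then n-2 steps, and so on down to 1, which is the arithmetic series

```latex
n + (n-1) + (n-2) + \cdots + 2 + 1 = \sum_{k=1}^{n} k = \frac{n(n+1)}{2} = \frac{n^2}{2} + \frac{n}{2} \in O(n^2)
```

For the 7-item example array above, that is 7 * 8 / 2 = 28 steps, compared with roughly 7 * log2(7), or about 20, when every pivot splits its group in half.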


Other Quicksort Worst Cases? 


So, as you can see, an array that's mostly in order will require on the order of N^2 steps!

As you can probably guess, Quicksort also has the same problem with arrays that are in reverse order!

So if you happen to know your data will be in mostly sorted (or reverse) order, avoid Quicksort!


It's a DOG! 


QuickSort Questions 


Can QuickSort be applied easily to 
sort items within a linked list? 


Is QuickSort a "stable" sort? 


Does QuickSort use a fixed amount 
of RAM, or can it vary? 


Can QuickSort be parallelized across 
multiple cores? 


When might you use QuickSort? 
Mergesort


The Mergesort is another extremely efficient sort - yet 
it's pretty easy to understand. 


But before we learn the Mergesort, we need to learn 
another algorithm called "merge". 
Mergesort


The basic merge algorithm takes two pre-sorted arrays as inputs and outputs a combined, third sorted array.


Merge Algorithm

Consider the left-most book in both shelves

Take the smallest of the two books

Add it to the new shelf

Repeat the whole process until all books are moved

By always selecting and moving the smallest book from either shelf, we guarantee all of our books will end up sorted!

1. Initialize counter variables i1, i2 to zero
2. While there are more items to copy...
       If A1[i1] is less than A2[i2]
           Copy A1[i1] to output array B and i1++
       Else
           Copy A2[i2] to output array B and i2++
3. If either array runs out, copy the entire contents of the other array over
Merge Algorithm In C++

Here's the C++ version of our merge function!

void merge(int data[], int n1, int n2, int temp[])
{
    int i1 = 0, i2 = 0, k = 0;
    int *A1 = data, *A2 = data + n1;

    while (i1 < n1 || i2 < n2)
    {
        if (i1 == n1)                  // left part used up: take from the right
            temp[k++] = A2[i2++];
        else if (i2 == n2)             // right part used up: take from the left
            temp[k++] = A1[i1++];
        else if (A1[i1] <= A2[i2])     // take the smaller front item
            temp[k++] = A1[i1++];
        else
            temp[k++] = A2[i2++];
    }
    for (int i = 0; i < n1 + n2; i++)  // copy the merged result back
        data[i] = temp[i];
}

You pass in an input array called data and the sizes of the two parts of it to merge: n1 and n2.

The last parameter, temp, is a temporary array of size n1+n2 that holds the merged results as we loop.

Finally, we copy our merged results back to the data array.
Mergesort


OK - so what's the full mergesort algorithm?


Mergesort function : 


1. If array has one element, then return (it's sorted). 
2. Split up the array into two equal sections 

3. Recursively call Mergesort function on the left half 
4. Recursively call Mergesort function on the right half 
5. Merge the two halves using our merge function 


It's difficult to show mergesort visually in static slides, so if you want 
to see it in action, download my PPT slides: lecture11-updated.pptx 
on www.careynachenberg.com 


Big-oh of 
Mergesort 


This is visually how mergesort divides its piles.

It divides the initial array in half, then recursively calls itself on each half to sort them, then merges the two sorted piles into one big pile.

Of course, each of those halves is further broken in half, and passed to another recursive call, and so on. This breaking in half happens until we reach a single book, as we see in the bottom row.

Then we merge the sorted piles on the way back up.

We start by merging just two books, one book from the left pile and one book from the right pile (see the bottom row).

At the next level up, we'll merge two books from the left pile with two books from the right pile.

Then up a level, we'll merge four books from the left pile and four from the right pile.

And so on...


Big-oh of Mergesort 


• Note that if there are N total values to sort, we'll keep breaking the array in half until we get arrays of just 1 value each.

• That will be log2(N) levels deep, which is the # of times we can divide N by two until we get to 1.

• On the way back up, we merge each of the arrays.

• On each row, we merge N total values (it's O(N)). That might not be obvious, but it's what happens:

• On the bottom row, we merge N arrays of 1 value each together into N/2 arrays of two values each. That's O(N) steps.

• On the second-to-last row, we merge N/2 arrays of 2 values each together into N/4 arrays of four values each. That's also O(N) steps.

• And so on, until we merge the top two arrays of size N/2 into a single array of size N. That's also O(N) steps.

• So log2(N) levels of O(N) merges per level is O(N*log2(N)).
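The same argument can be written as a recurrence. Writing T(N) for the cost of sorting N items, and assuming for simplicity that N is a power of two and that merging N items costs cN steps for some constant c:

```latex
\begin{aligned}
T(1) &= c \\
T(N) &= 2\,T(N/2) + cN \\
     &= 4\,T(N/4) + 2cN \\
     &= 2^k\,T(N/2^k) + k\,cN \qquad \text{(after } k \text{ levels)} \\
     &= N\,T(1) + cN\log_2 N \qquad (k = \log_2 N) \\
     &= cN + cN\log_2 N \in O(N \log N)
\end{aligned}
```

Each line of the unrolling corresponds to one row of merges in the picture: the k cN term counts the rows finished so far.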


Big-oh of Mergesort

[Figure: the recursion tree of book piles, log2(n) levels deep, with n items merged at each level.]

Why log2(n) levels? Because we keep dividing our piles in half... until our piles are just 1 book!

Overall, this gives us n*log2(n) steps to sort n items of data. Not bad!
Mergesort - Any Problem Cases


So, are there any cases 
where mergesort is less 
efficient? 


No! Mergesort works equally well regardless of the ordering of the data...


However, because the merge function needs secondary arrays to merge, this can slow things down a bit...


In contrast, quicksort doesn't need to allocate any new 
arrays to work. 


MergeSort Questions 


Can MergeSort be applied easily to 
sort items within a linked list? 


Is MergeSort a "stable" sort? 


Are there any special uses for MergeSort 
that other sorts can't handle? 


Can MergeSort be parallelized across 
multiple cores? 


Sorting Overview

Selection Sort - Unstable
Always O(n^2), but simple to implement. Can be used with linked lists. Minimizes the number of item-swaps (important if swaps are slow).

Insertion Sort - Stable
O(n) for already or nearly-ordered arrays. O(n^2) otherwise. Can be used with linked lists. Easy to implement.

Bubble Sort - Stable
O(n) for already or nearly-ordered arrays (with a good implementation). O(n^2) otherwise. Can be used with linked lists. Easy to implement. Rarely a good answer on an interview!

Shell Sort - Unstable
O(n^1.25) approx. OK for linked lists. Used in some embedded systems (e.g., in a car) instead of quicksort due to fixed RAM usage.

Quick Sort - Unstable
O(n log2(n)) average, O(n^2) for already/mostly/reverse ordered arrays or arrays with the same value repeated many times. Can be used with linked lists. Can be parallelized across multiple cores. Can require up to O(n) slots of extra RAM (for recursion) in the worst case, O(log2(n)) avg.

Merge Sort - Stable
O(n log2(n)) always. Used for sorting large amounts of data on disk (aka "external sorting"). Can be used to sort linked lists. Can be parallelized across multiple cores. Downside: Requires n slots of extra memory/disk for merging - other sorts don't need extra RAM.

Heap Sort - Unstable
O(n log2(n)) always. Sometimes used in low-RAM embedded systems because of its performance/low memory req'ts.
Challenge Problems


1. Give an algorithm to efficiently determine 
which element occurs the largest number of 
times in the array. 


2. What's the best algorithm to sort 1,000,000 
random numbers that are all between 1 and 5? 
Tree Data Structures
What's the big picture?


A tree is a data structure that stores values in a hierarchical fashion.


We often use linked lists to build trees. For instance, the tree pictured on this slide has nodes with two "next" pointers - one going left and one right.


Trees are an alternative to linked lists and 
arrays when you need more organization of 
your data. 
Trees


“I think that I shall never see a data structure as lovely as a Tree.” - Carey Nachenberg


A Tree is a special linked list-based data structure that has many uses in Computer Science:

• To organize hierarchical data (e.g., A Family Tree)
• To make information easily searchable (e.g., A Binary Search Tree)
• To simplify the evaluation of mathematical expressions (e.g., An Expression Tree)
• To make decisions (e.g., A Decision Tree)

[Figures: A Family Tree, A Binary Search Tree, An Expression Tree, and A Decision Tree that asks "Does the patient have a fever?", then either "Does he/she have spots on his/her face?" or "Does he/she have a sore throat?", ending at "He has COOTIES!"]


Basic Tree Facts 


1. Trees are made of nodes (just like linked list nodes).
2. Every tree has a "root" pointer.
3. The top node of a tree is called its "root" node.
4. Every node may have zero or more "children" nodes.
5. A node with 0 children is called a "leaf" node.
6. A tree with no nodes is called an "empty tree."

struct node
{
    int value;             // some value
    node *left, *right;
};

But instead of just one next pointer, a tree node can have two or more next pointers!

node *rootPtr;

The tree's root pointer is like a linked list's head pointer!

[Figure: an empty tree, whose root ptr is NULL, and a sample tree whose root node holds 5, with its root node and a leaf node labeled.]
Tree Nodes Can Have Many Children


A tree node can have more than just two children:

struct node
{
    int value;             // node data
    node *pChild1, *pChild2, *pChild3, ...;
};

struct node
{
    int value;             // node data
    node *pChildren[26];
};




Binary Trees 


A binary tree is a special form of tree. In a binary tree, every node has at most two children nodes: a left child and a right child.

struct BTNODE              // binary tree node!
{
    string value;          // node data
    BTNODE *pLeft, *pRight;
};

[Figure: A Binary Tree of names, including "sheila", "simon", "martha", and "milton".]

It's important to note that not every binary tree is a binary search tree.

For instance, the tree to the right is a binary tree, since each node has at most two children nodes, but it is NOT a binary search tree.

In contrast, a binary SEARCH tree is a binary tree where the organization of the nodes follows certain ordering rules.
Binary Tree Subtrees


We can pick any node in the tree...

And then focus on its “subtree” - which includes it and all of the nodes below it.

For example, if we pick the "leon" node, its subtree includes four different nodes and has the "leon" node as its root.

[Figure: the subtree rooted at the "leon" node.]


Binary Tree Subtrees


If we pick a node from our tree... we can also identify its left and right sub-trees.

[Figure: picking the "andrea" node and highlighting its left and right sub-trees.]


Operations on Binary Trees 


The following are common operations that we might 
perform on a Binary Tree: 


• enumerating all the items
• searching for an item
• adding a new item at a certain position on the tree
• deleting an item
• deleting the entire tree (destruction)
• removing a whole section of a tree (called pruning)
• adding a whole section to a tree (called grafting)


We'll learn about many of these operations over the next two classes.
A Simple Tree Example

#include <cstddef>   // for NULL

struct BTNODE        // node
{
    int value;       // data
    BTNODE *left, *right;
};

int main()
{
    BTNODE *temp, *pRoot;

    pRoot = new BTNODE;
    pRoot->value = 5;

    temp = new BTNODE;
    temp->value = 7;
    temp->left = NULL;
    temp->right = NULL;
    pRoot->left = temp;

    temp = new BTNODE;
    temp->value = -3;
    temp->left = NULL;
    temp->right = NULL;
    pRoot->right = temp;

    // etc...
}

As with linked lists, we use dynamic memory to allocate our nodes.

[Figure: pRoot holds address 1000, the root node (value 5). Its left pointer holds 1200, the node with value 7, and its right pointer holds 1100, the node with value -3. Both child nodes have NULL left and right pointers.]

And of course, later we'd have to delete our tree's nodes.


We've created a binary tree... now what?


Now that we've created a binary tree, what can we do with it?


Well, next class we'll learn how to use the binary tree to speed up searching for data.


But for now, let's learn how to iterate through each item in a tree, one at a time.


This is called “traversing” the tree, and there are several ways to do it.
Binary Tree Traversals


When we iterate through all the nodes in a tree, it's called a traversal.


Any time we traverse through a tree, we always start with the root node.


There are four common ways to traverse a tree.


Each technique differs in the order that each node is visited during the traversal:

1. Pre-order traversal
2. In-order traversal
3. Post-order traversal
4. Level-order traversal
The Preorder Traversal

PreOrder(node):
1. Process the current node.
2. Recursively call PreOrder on the left sub-tree.
3. Recursively call PreOrder on the right sub-tree.

(By process, we mean things like...
• Print the node's value out
• Search the node for a particular value
• Add the node's value to a total)

• The PreOrder traversal is a recursive traversal that processes all of the nodes in a tree.

• Can you guess why it's called a "pre-order" traversal?

• Because at each node, we process the current node before processing the node's left and right subtrees.

• So, for example, when we start at the "Eat" node, we process "Eat" first, then process the "Rats" subtree in its entirety, then process the "Are" subtree in its entirety.

• And when the algorithm is asked to process the "Rats" node, it processes it first, then processes the "For" subtree in its entirety, then processes the "They" subtree in its entirety.

• So the order in which a pre-order traversal would process the nodes is:

    Eat, Rats, For, They, Are, Tasty, Treats
The Pre-order Traversal

Output: USC kids have no clue

[Figure: a tree whose root is "usc"; its children are "kids" and "clue", and "kids" has children "have" and "no".]

• Below we see the PreOrder function - look how simple it is!

• The first line, "if (cur == nullptr)", checks for the base case. If we are passed an empty tree/subtree, then we just return and do nothing. This is a super-common pattern for tree-based recursion. Always include a check for nullptr.

• Next, we process the current node's value, in this case, printing it out.

• Finally, we recursively call ourself on the left child of the current node (the root of the left subtree).

• When that's done, we recursively call ourself on the right child of the current node to process the right subtree.

void PreOrder(Node *cur)
{
    if (cur == nullptr)              // if empty, return...
        return;

    cout << cur->value << " ";       // Process the current node.

    PreOrder(cur->left);             // Process nodes in left sub-tree.
    PreOrder(cur->right);            // Process nodes in right sub-tree.
}

int main()
{
    Node *root;
    // ... build the tree shown above ...
    PreOrder(root);
}


